Categories of variables in analysis of genetic diversity in S1 progenies of Psidium guajava

Crossing and developing inbred lines have been promising options for guava breeding programs. The purpose of this study was to evaluate the genetic divergence among genotypes of S1 inbred guava families by means of the Gower’s technique and the Ward-MLM methodology, to verify the correlation and relative contribution of traits, as well as to identify descriptors with minimum efficiency for this species. The experiment was implemented at the Estação Experimental da Ilha Barra do Pomba, in the municipality of Itaocara, RJ, Brazil. A randomized block design with 18 inbred families, three replicates, and ten plants per plot was used for the experimental design. After 19 months from the planting of the experiment, the 61 earliest and most productive genotypes (individual plants) were evaluated. For this purpose, 29 descriptors were evaluated, of which fifteen were qualitative and fourteen, quantitative. The characteristics required to obtain the distance matrix were analyzed based on the Gower algorithm, and a comparative cluster between the dendrograms of the morphoagronomic variables was achieved from this matrix. Lastly, the Ward-MLM procedure was applied to form the clusters of inbred families. By using all 29 descriptors, greater efficiency was achieved in cluster discrimination. Hence, according to the results identified, it is not possible to indicate minimum descriptors for the culture. Using the Ward-MLM method, the descriptors that most contributed to the divergence among the genotypes were fruit flesh mass, fruit weight, fruit diameter, fruit flesh thickness, fruit placental mass, and fruit length. The most divergent genotypes can be recommended for further crosses or self-pollinations to develop new lines in the guava breeding program of UENF.


Material and methods
Self-pollination and obtaining the evaluated population. The 18 inbred families evaluated here came from populations developed by Pessanha et al. 1 in a pre-breeding study. After molecular analysis, intraspecific crosses of Psidium guajava accesses were performed. These crosses were performed among the seven superior and contrasting parents, found via RAPD molecular analysis, resulting in 17 segregating families. This segregating population of wide genetic variability was subsequently evaluated and selected by Quintal et al. 5 via REML/BLUP, in which the most productive progenies were selected and self-fertilized to originate the 18 inbred families that comprise this experiment ( Table 1).
The study by Quintal et al. 5 aimed to produce new varieties of guava with superior characteristics by the REML/BLUP procedure at individual level. The families selected by the authors can be highlighted by their first positions in the ranking for most of the agronomic traits and fruit quality evaluated.
Notably, guava is considered a mixed reproduction species (autogamous-allogamous) 19,20 and has a selfpollination rate of 60%. Therefore, for being a self-compatible species 21 , inbred populations can be developed, either by self-pollination or crossing between related individuals, without inbreeding depression 22 .
Guava flowers are actinomorphic, hermaphroditic, being a potential factor for autogamy 22 . Faced with this possibility, plants were self-pollinated in the field and obtained by protecting flowers, covering buds before anthesis. The buds were identified and fruits were later protected with a paper bag. After harvesting the fruits, seeds were removed by rubbing them against a steel mesh sieve under running water. The removed seeds were left to dry at room temperature for 48 h, and constantly turned over to provide uniform drying. Seeds from the fruits of self-fertilization were sown in tubettes (three seeds each tubette) and maintained in a greenhouse under polypropylene screens (Sombrite ® 50%) at the Research Support Unit (UAP), which is located in the State University of North Fluminense Darcy Ribeiro, campus of Campos dos Goytacazes-RJ (Brazil). Ambient humidity (greenhouse) was controlled by an automatic misting system activated when the Table 1. Origin and development of S1 families of guava. UENF, Campos dos Goytacazes, RJ, 2022. *Parents of S 1 plants/family: Family/Genotype/Block: most productive genotypes evaluated and selected by Quintal et al. 5 that were inbred and gave rise to S 1 plants. After the development of the seedlings in the greenhouse, they were planted in the field in July 2014 at a 4-m spacing between rows and 1.5-m between plants. Liming, planting and covering fertilizers were carried out according to soil analysis, in accordance with the recommendations of Costa and Costa 24 ; it was also used drip irrigation.
Nineteen months after planting the 540 plants of the experiment, the earliest and most productive plants were evaluated. As a result, 61 (individual plants) of guava trees were evaluated by means of the descriptors defined for the species P. guajava, in line with the International Union for the Protection of New Varieties of Plants-UPOV 25 and described in Campos et al. 2 (Table 3).
Phenotyping. The following morphoagronomic traits were evaluated: (1) Stem-Stem diameter at 10 cm from the soil; branch height; and color of the stem in young buds in the 61 genotypes, as well as five young and fully developed leaves and five fruits per plant; (2) Leaves-Qualitative descriptors in the young leaves were the presence of anthocyanin and its intensity in the staining. In the fully developed leaves, the shape; curvature in the cross section and central vein; green color; base and tip shape were evaluated. Length; width; length/width ratio; and secondary leaf vein spacing were also examined; (3) Fruits-Final stalk shape; skin color; surface texture; calyx cavity diameter in relation to the one of the fruits; flesh color; and external flesh thickness in relation to the core diameter were all evaluated. Quantitative descriptors for fruit were number; fruit weight; fruit placental mass; fruit flesh mass; length; diameter; shape index; and flesh and placental thickness of the fruit.

Analysis of genetic diversity by the Gower's technique and clustering by the UPGMA method.
In the analysis of genetic diversity, multivariate methods were adopted, applying quantitative and qualitative variables. Firstly, the Gower index 26 was used to evaluate the quantitative and qualitative traits. Matrices and later dendrograms were constructed. At first, the main matrix, which contains the 29 descriptors, was constructed and served as a reference for data analysis. Subsequently, the matrices comprising the descriptors of stem; leaves; and fruits were constructed separately. An estimate of the dissimilarity index, ranging from 0 to 1, was generated.
The dissimilarity was obtained by in which:i and j = individuals to be compared regarding characteristic k;p = total number of characteristics; andS ij = contribution of variable k to the total distance. If the variable was qualitative, S ijk considered value 1 when there was positive or negative agreement for characteristic k between individuals i and j; otherwise, when the variable is quantitative, there is in which:R k = the amplitude of variation of variable k, assuming values 0 and 1 or integers between them.
The w ijk value was a weight applied to define individual S ijk's contributions. Under this aspect, when the value of variable k was absent in one or both individuals, w ijk = 0; or, if not, it was 1.
On the basis of the distance matrices generated, individuals were clustered by the Unweighted Pair Group Method with Arithmetic Mean-UPGMA, and the distance matrices with the 29 variables were compared with the distance matrices with fewer variables (leaves and fruits) using the Dendextend package in the R program (http:// www.r-proje ct. org).
Relative importance of traits. As a suggestion for discarding the less informative quantitative descriptors, it was applied a method based on the relative importance of the traits 27 , in which this relative importance was estimated by means of the participation of the components, concerning each of the traits, in the total dissimilarity observed. The analysis was conducted by using the Genes computer program 7 .
Pearson correlation. Pearson correlation was employed to verify the correlation among the quantitative descriptors, considering that the traits that can be discarded must be correlated to others selected. The correlation coefficient significance was examined by the t test. Statistical analyses were carried out with the help of the R Program 28 . www.nature.com/scientificreports/ Analysis of genetic diversity by the ward-MLM method. After the analysis carried out by the Gower Index, the Ward-MLM (Ward-Modified Location Model) method proposed by Franco et al. 12 was used, as reported by Viana and Resende 29 . Next, it was established the ideal number of clusters following the criteria of pseudo-F and pseudo T 2 using the Ward clustering method 30 . Considering the optimal number of clusters, the hierarchical classification was obtained by the Ward method, which gives the parameters required to implement the final step of the MLM model 31 .
Differences among the clusters, the correlation between the variables and canonical variable (CV) were graphically analyzed. These analyses were made using the SAS statistical software 32 . Diagrams were obtained using the Sigma Plot software, version 11.0.

Ethical approval and informed consent. The research does not need ethical standards, because it has no
Human and/or Animal participants. The article meets ethical standards. The research does not involve human participants and animal welfare.

Results and discussions
Comparative analysis and clustering using the UPGMA method. From the dissimilarity matrix generated by the quantitative and qualitative variables, it could be established the discrimination among the genotypes of the S 1 inbred families of Psidium guajava. In the clustering analysis of the 61 genotypes, the UPGMA method enabled the formation of three distinct clusters in which all the 29 morphoagronomic descriptors were considered, indicating genetic diversity among the genotypes under evaluation. Cluster I comprised 42 genotypes; cluster II, 7 genotypes; and cluster III, 12 (Fig. 1).
The purpose of this clustering analysis is to gather genotypes into clusters so that there is homogeneity within the cluster and heterogeneity among them. Among the multivariate analysis methods to quantify genetic diversity, the UPGMA is highlighted for being a technique that uses unweighted means of dissimilarity measurements, which avoids characterizing dissimilarity by extreme values (minimum and maximum) among the genotypes considered 7 .
In the comparative analysis between dendrograms with all descriptors, and dendrograms with only leaf/stem and fruit descriptors, the need for different descriptors to characterize the genetic divergence of Psidium guajava is evident, since both the number of clusters and the arrangement among genotypes were not kept equal in the analyses. The entanglement value, which measures the matching of genotypes between distinct dendrograms, ranges from 0 to 1, in which 0 means fully matching dendrograms and 1, dendrograms without any matching. In this way, in the comparison of the dendrograms, it was seen the greatest entanglement was 0.28 in the dendrogram showing the leaf/stem descriptors, demonstrating divergence in the distribution of the genotypes in the dendrograms (Fig. 1).
Clusters I and II were formed by 51 and 10 genotypes, respectively. When compared with the dendrogram containing only fruit descriptors, it presented entanglement of 0.24 (Fig. 2). The formation of three clusters was also identified, with cluster I consisting of 43 individuals; cluster II, of 13; and cluster III, of 5 genotypes.
In view of the results achieved by the comparative analysis between the dendrogram that includes all 29 descriptors with the dendrogram that only contains leaf/stem or fruit descriptors, it is necessary to use different descriptors to characterize the genetic divergence among the S 1 inbred families of Psidium guajava.
Therefore, there is no possibility of indicating minimum descriptors for the culture given the results found. Similarly, by using the Dendextend package when comparing the dendrograms, with all the descriptors and the dendrogram containing only flower, leaf, or fruit descriptors, Santos et al. 33 made it clear that different descriptors are required to characterize genetic diversity in Passiflora, because both the number of clusters and the arrangement between genotypes did not remain the same.
Relative importance of traits. Table 2 depicts the relative contribution of the traits evaluated for diversity (Sj) and their percentage values, which constitute the measure of the relative importance of variable j for the study of genetic diversity. Following the Singh method 27 , adopted to evaluate the relative contribution of the 14 quantitative traits, it was determined that three of these traits contributed to the genetic divergence with 98.80%, while the others contributed with only 1.20%. The traits with the highest relative contribution values and which contributed most to the genotype differentiation were fruit weight (91.95%); placental weight (3.68%); and fruit flesh mass (3.15%). As such, under this criterion, it can be confirmed that these descriptors are important in the characterization of the guava genotypes evaluated, since they contribute significantly (more than 1.0% of the total variation) to the divergence discrimination.
As the fruit shape; leaf length/width ratio; and secondary leaf vein spacing traits did not provide relative contribution, they can be recommended for disposal. The other variables were poorly informative to evaluate the genetic dissimilarity of guava trees, considering that they had estimates of relative contribution of small magnitudes. In the case of some descriptors, however, low variability can be explained by the first generation of selffertilization made, which provides an increase in homozygosity and a decrease in heterozygosity in the progeny, allowing, in this manner, the achievement of more homogeneous genotypes and a consequent allele fixation 34 . Pearson correlation. Estimates of phenotypic correlation values, resulting from the evaluation of 61 guava genotypes, are depicted in Fig. 3. To explain the relationships between the traits of economic importance, correlation estimates should be considered satisfactory, that is, r > ± 0.50 35  www.nature.com/scientificreports/ These correlations with fruit weight were expected, considering that genotypes with heavier fruits typically produce wider fruits with, consequently, larger flesh mass. Hence, these correlation estimates allow forecasting the response of a trait when performing the selection in another correlated one, in other words, it enables, for example, the selection in a trait of easy measurement to obtain gains in another of difficult measurement 36 . Clustering analysis by means of the Ward-MLM strategy. In accordance with the Ward-MLM strategy, two clusters were formed following the criteria of pseudo-F and pseudo t 2 (Fig. 4). The ideal number of clusters where there was the greatest increase in logarithmic function was verified, and the highest absolute value of 41.98 was confirmed. The indication of the number of clusters is an innovative aspect of the Ward-MLM procedure, in comparison with other hierarchical methods, which results in a more precise and less subjective cluster formation 13,37 .
The study conducted by Campos et al. 2 with 138 guava genotypes obtained from biparental controlled crossings determined 8 the ideal number of clusters, with an increment value of 67.51. The authors noted a considerable degree of heterozygosity and wide genetic variability in these genotypes for being a segregating population. As such, it is worth mentioning that the results are different from the ones in this study, in which only the formation of two clusters was verified. Such fact seen among the genotypes of the S 1 families is due to the reduction of genetic variability within the families and, consequently, to the increase in the frequencies of homozygosity in the first generation of self-fertilization.
Clusters I and II were formed by 33 and 28 genotypes, respectively. Genotypes were classified into the 49 phenotypic classes of the 15 qualitative descriptors evaluated, demonstrating a low genetic variability among the     For the predominant leaf color, green gray was found in 47.5% of the genotypes, followed by green (26.2%); yellow green (23%); and dark green (3.3%).
In relation to fruit descriptors, the genotypes presented final truncated stalk shape (54.1%); rounded (26.2%); broadly rounded (18.1%); and neck-shaped (1.6%). In terms of skin texture, 93.4% of the genotypes showed a soft texture and 6.6% a rough one. On the other hand, 59% of the genotypes had pale yellow skin color; 23%, pale green-yellow; 14.7%, dark yellow; and 3.3%, orange. Pereira and Nachtigal 38 pointed out that fruits of yellowish green or yellow color, when ripe, are more commercially appreciated.
For the flesh color descriptor, genotypes were classified into five categories, with a range from cream to orange pink, but with a predominance of dark pink, which was found in 37.7% of the genotypes. Pereira and Nachtigal 38 say that pink or reddish flesh fruits are preferred not only for the industry but also for the consumption in natura in the Brazilian market.
In the calyx cavity diameter regarding the fruit, 63.9% of the genotypes were classified as medium (68.8%); thick (21.3%); thin (6.6%); and very thin (3.3%). These descriptors prove to be relevant when choosing genotypes for consumption in natura, given that they are related to factors that depreciate the product if they are not attractive to consumers.
With respect to the quantitative descriptors, cluster II had the highest number of fruits per plant (125), followed by cluster I (111) ( Table 4). In both, however, there was great variation, as seen by the minimum and maximum values of each group.
There is still less potential for production when comparing with the Paluma cultivar, which produces an average of 188 fruits per plant. Nevertheless, as the evaluation was carried out in the first year of cultivation, this was already expected. As stated by Natale et al. 39 , guava trees bear fruits from the first year onwards and, over the course of the harvest, production gradually increases until it stabilizes. Even though the number of fruits produced by some genotypes does not come close to being found in some cultivars, the families in this study are in their first harvests with the possibility of an increase in production over the cycles.
Fruit weight trait also varied within the clusters, with cluster II having the highest value (329 g). There was high magnitudes of fruit weight, reflecting the first production, when the plants still have not expressed their full productive potential and thus produce larger fruits. In general, the trend is towards the fruit mass decreasing with the stabilization of commercial production, when the plant tends to yield a greater number of fruits. Gonzaga Neto et al. 40 consider the increase in the average weight of the fruits to be related to the number of fruits produced by each plant; as such, the greater quantity of fruits in the plant may influence it to produce smaller fruits in weight and size because the available reserves would be used to fill a greater number of fruits, limiting the size of each one of them.
Cluster I, however, had an average fruit weight of 177.2 g. Hence, these values are in accordance with the standard recommended for consumption because, according to Lima et al. 41 fruits ranging from 100 to 200 g can be utilized for dual purposes, in other words, they can be used both for industrial processing and for consumption in natura. www.nature.com/scientificreports/ Cluster II also showed a higher value for the fruit flesh mass (242 g) and fruit placental mass (47.9 g), resulting in a greater fruit length (101 mm) and fruit diameter (82.48 mm). Flesh mass trait is of utmost relevance, once it contributes to the definition of the flesh yield. According to Gonzaga Neto et al. 40 , the average fruit mass is an important trait, given that, in general, fruits with the greatest flesh mass are also the largest and, in turn, they are more attractive to the consumer.
For fruit shape index, values were close among the clusters, in which the mean was of 1.14 e 1.09 for cluster 1 and 2 respectively. The analysis of these variables, separately, has little importance for the fruit characterization of guava tree genotypes 41 . On the other hand, the relationship among these variables is highly representative, since it indicates the fruit shape. Pyriform or oval fruits (LD/CD ratio greater than 1) are suitable for consumption in natura, and those with rounded forms (LD/CD ratio close to 1) are better indicated for industrialization 42 .
The clusters presented close means between them for placental thickness and flesh thickness of the fruit, with mean values of 39.35 and 12.96 for cluster 1, and 14.77 and 41.77 for cluster 2 respectively. For the stem diameter at 10 cm from the soil, cluster I reported a mean of 49.56 mm followed by cluster II, with 53.12 mm. For descriptors related to the leaf, clusters I and II were similar for length, width, length/width ratio, and vein spacing, which resulted in a lower mean length/width ratio of (1.89 and 1.94), presenting the characteristic of a rounder leaf. For being correlated with the leaf area of the plant, the leaf length and leaf width are significant. The leaf area is directly related to the taking advantage of solar energy, which is converted into chemical energy during the process of photosynthesis 43 .
The first two canonical variables (CV) obtained by the Ward-MLM strategy explained 100% of the total variation, being 83% of CV1 and 17% of CV2 (Fig. 5). As such, a two-dimensional representation is the most appropriate to represent the data set. The Ward-MLM procedure was applied to quantify genetic variability in studies on maize 14 ; pepper/sweet pepper 15 ; tomato 13 ; common beans 16 ; pepper/sweet pepper 17 ; and banana 18 . These authors noticed the first two canonical variables explained the variability between the clusters above 80%, and the two-dimensional graph was adequate to visualize the relationship between the clusters. In this way, this high value shows the graphic representation of the first two canonical variables is adequate to verify the relationship among clusters and individuals within a cluster, enhancing the reliability of the results.
The quantitative descriptors that most contributed to quantify the genetic diversity among the genotypes, in other words, the ones that displayed the greatest correlations with the first canonical variable were fruit flesh mass (0.549); fruit weight (0.533); fruit diameter (0.515); fruit flesh thickness (0.462); fruit placental mass (0.455); and fruit length (0.295) ( Table 5). In the study developed by Campos et al. 2 , with guava accesses, the authors found that the greatest correlations of the variables with the first canonical variable were flesh mass; fruit mass; fruit diameter; mean fruit weight; mesocarp thickness; and fresh placental mass.
In view of the above, it is worth emphasizing that knowing the morphoagronomic traits evaluated herein is fundamental in plant breeding programs, especially to support future crosses and/or self-fertilization of the genetic breeding program of guava trees. By characterizing the genetic diversity of the S 1 inbred families of guava tree, the identification of genotypes agronomically superior was made possible, providing better gains as a result of their selection. This study provided a more accurate basis for choosing the best genotypes that can be selected and self-fertilized, so as to achieve the most promising genotypes within the breeding program.

Final considerations
Comparative analysis between dendrograms revealed the use of all 29 descriptors led to greater efficiency in discriminating clusters, enabling genetic dissimilarity to be achieved more efficiently among the S 1 inbred families of guava trees. www.nature.com/scientificreports/ The Gower's technique is an effective tool to detect genetic divergence and to cluster accesses using qualitative and quantitative variables simultaneously.
The traits with the greatest relative contribution to genetic variability in the families studied were fruit weight; placental weight; and fruit flesh mass.
The Ward-MLM statistical procedure is a useful tool in detecting genetic divergence and in clustering genotypes, and, by means of it, the descriptors that most contributed to the divergence among the guava genotypes were fruit flesh mass; fruit weight; fruit diameter; fruit flesh thickness; fruit placental mass; and fruit length.
The most promising genotypes for selection in the breeding program belong to group II, due to their good performance for all traits, especially fruit number, fruit weight, fruit pulp mass, and fruit diameter and length.
Therefore, the most divergent genotypes with high production potential can be recommended for further crosses or self-fertilization to acquire new lines in the guava breeding program at UENF.